DOCUMENT RESUME

ED 110 0[illegible]5                                        IR 002 320

AUTHOR        Bierschenk, Bernhard
TITLE         A Computer-Based Content Analysis of Interview Data:
              Some Problems in the Construction and Application of
              Coding Rules.
INSTITUTION   School of Education, Malmo (Sweden). Dept. of
              Educational and Psychological Research.
PUB DATE      Nov 74
NOTE          [page count illegible]; Special-topic Bulletin number 45

EDRS PRICE    MF-$0.76 HC-$1.95 PLUS POSTAGE
DESCRIPTORS   *Computational Linguistics; Computer Oriented
              Programs; *Computer Programs; *Content Analysis; Data
              Analysis; *Data Collection; Data Processing; Flow
              Charts; Information Processing; *Interviews; Item
              Analysis; Pilot Projects; Research; Statistical
              Data
IDENTIFIERS   *ANACONDA; Analysis Of Concepts By Data Processing;
              Sweden; UNIVAC



ABSTRACT

The development of a technique for a computer-based
content analysis of interview data is described. A preliminary
version of ANACONDA (ANAlysis of CONcepts by DAta-processing) is
presented, and empirical results are shown from the application of
the technique by independent coders to test material. Proposed
modifications and extensions of the system are also discussed.
(DGC)



* Documents acquired by ERIC include many informal unpublished
* materials not available from other sources. ERIC makes every effort
* to obtain the best copy available; nevertheless, items of marginal
* reproducibility are often encountered and this affects the quality
* of the microfiche and hardcopy reproductions ERIC makes available
* via the ERIC Document Reproduction Service (EDRS). EDRS is not
* responsible for the quality of the original document. Reproductions
* supplied by EDRS are the best that can be made from the original.



A COMPUTER-BASED CONTENT ANALYSIS OF INTERVIEW DATA:
SOME PROBLEMS IN THE CONSTRUCTION AND APPLICATION OF
CODING RULES

Bernhard Bierschenk



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT.
POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY



Bierschenk, B. A computer-based content analysis of interview data: Some
problems in the construction and application of coding rules. Didakometry
(Malmö, Sweden: School of Education), No. 45, 1974.

This report discusses the development of a technique for a computer-based
content analysis. It presents a flow-chart of different stages in the
designing of an ANAlysis of CONcepts by DAta-processing. The acronym
ANACONDA is the name that has been given to this technique. A condensed
preliminary version of ANACONDA is presented and empirical results are
shown from the application of the technique by independent coders to test
material. The entire test material has been checked in order to obtain a
faultlessly punched and coded text. Empirical data are presented from this
check. In conclusion the next steps are discussed: (1) scaling of qualifiers
and (2) construction of registers.

Keywords: Data-processing, concept analysis, intercoder agreement,
interview texts, psycholinguistics



CONTENTS

1. SOME INTRODUCTORY NOTES

2. COLLECTION AND PROCESSING OF EMPIRICAL DATA

3. PRINCIPLES FOR CODING NATURAL LANGUAGE

4. REPRESENTATION OF STATEMENTS

   4.1 Directions for writing out spoken text
   4.2 Marking of text
   4.3 Segmentation of text
   4.4 The agent-action-object (AaO) paradigm
   4.5 Development of rules for formalization of text
   4.6 Allotment of codes to text
   4.7 Supplementation of text
   4.8 Selection of coders and choice of criteria for
       intercoder agreement

5. INTERCODER AGREEMENT: SOME EMPIRICAL RESULTS

   5.1 Intercoder agreement
   5.2 Control of punch cards

6. STRUCTURED REGISTERS

   6.1 Facets

7. REFERENCES

1. SOME INTRODUCTORY NOTES

When one has a complex verbal material with a low degree of structurization,
a goal-oriented analysis becomes extremely onerous. As a result,
researchers often try to avoid this situation by constructing "assessment
scales" containing statements or questions with fixed alternative answers.
They are easier to handle from the processing point of view, even though
the gathering of information in the form of e.g. an interview would have
been more suitable. Manual analyses of verbal data are in addition often
determined in far too great a degree by practical considerations rather
than by what is desirable from a scientific viewpoint. The result of such
an approach is that the analysis is usually limited to simple frequency
comparisons, which means that much information that is relevant for the
investigation is lost. In order to avoid such rough analyses, the researcher
should create quantities of data with a high degree of structurization.

The computer can be used for the handling and processing of text. But
an automatization of text analyses presupposes a coding of text and
computer programs that direct the processing of natural language. Content
analysis techniques based on the use of computers have been developed
within widely separated fields (see Gerbner, Holsti, Krippendorff,
Paisley & Stone, 1969). If computer-based analysis techniques are
developed, the content analysis process will become flexible and the technique
can thereby be used for processing large quantities of verbal data. In
addition it becomes possible to refine the analysis technique so that
better statistical models can be used than has so far been the case.
Computer-based analysis techniques permit a greater dispersion of
information in complex material that is difficult to survey. This in its turn
produces much more detailed analyses than a manually conducted
analysis would allow. But at the same time automatic storage and processing
of text presupposes that the storage takes place in accordance with a
given format. The format states how each individual element should be
stored so that different elements can be placed in relation to each other
by means of e.g. Boolean algebra.
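
The idea of a fixed storage format combined with Boolean conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (person, sentence, code, text), the numerical codes and the sample records are hypothetical and are not the ANACONDA format itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    person: int    # hypothetical field: interviewed person number
    sentence: int  # hypothetical field: sentence number
    code: int      # hypothetical field: numerical category code
    text: str      # the word unit itself


# Invented sample records, for illustration only.
RECORDS = [
    Element(1, 1, 30, "the resource situation"),
    Element(1, 1, 40, "exercised pressure on"),
    Element(1, 1, 70, "problem definition"),
    Element(2, 1, 30, "the department"),
    Element(2, 1, 40, "supports"),
]


def sentences_with(records, required, excluded=()):
    """Return the (person, sentence) keys whose set of codes contains
    every code in `required` (AND) and no code in `excluded` (AND NOT)."""
    codes_by_sentence = {}
    for r in records:
        codes_by_sentence.setdefault((r.person, r.sentence), set()).add(r.code)
    return sorted(
        key
        for key, codes in codes_by_sentence.items()
        if set(required) <= codes and not (set(excluded) & codes)
    )


print(sentences_with(RECORDS, required=[30, 40]))
print(sentences_with(RECORDS, required=[30, 40], excluded=[70]))
```

Because every element is stored with the same fields, a query is nothing more than a set condition on the codes that occur together in one sentence.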






2. COLLECTION AND PROCESSING OF EMPIRICAL DATA

The problem of gaining access to a technique for computer-based
content analysis has featured largely in a research project on search and
steering strategies in educational and psychological research planning.
This project is financed by the Swedish Board of Education. The work
within this project was initiated with an interview study involving forty
randomly selected researchers working in departments of educational
and psychological research in Sweden (see B. Bierschenk, 1974). The
material was collected during May, 1973. The interviews were recorded
on audio-tape. The recorded interviews have then been written out by
secretaries, but without the use of any phonetic transcription. The
total material now comprises approximately 4,000 pages of text,
measuring 21x29 cm. These are first to be prepared manually by the
insertion of syntactical codes, so that the interview data can then be
processed by the computer UNIVAC CD 3600, without the original
structure of the text being lost.

In order to be able to carry out an ANAlysis of CONcepts by
DAta-processing (ANACONDA), rules have been worked out for the manual
coding and a condensed preliminary version of ANACONDA will be
presented in this report. Empirical results of application of ANACONDA to
test material by independent coders are presented. Furthermore, a
check of the punching and the punch cards has been carried out in order
to attain perfect material and the control is described in this report.

The check of the intercoder agreement in the application of
ANACONDA has been carried out by Berg and the results are reported in
detail in Berg (1974). The construction of rules for ANACONDA has
been worked out by the linguist I. Bierschenk. A preliminary version and
some evaluation data are presented in I. Bierschenk (1974).



3. PRINCIPLES FOR CODING NATURAL LANGUAGE

The text material of the interviews must be formalized, which means
breaking it down into smaller units such as clauses, phrases, groups of
words or single words. This does not mean, however, that this analysis
is based on syntax; it is instead based on the conceptual context
(models, images) underlying an utterance. The conceptual basis is the meaning
of an utterance. We assume that there is a covered structure, namely the
speaker's thought (what he wishes to say), and an uncovered structure (that
which in reality is said and heard). If an utterance is to have "meaning",
it must be comprehensible, which need not be the same as being
grammatically correct.

Seen in this way, an utterance consists of a conceptualization. The
basic unit of the conceptualization is the concept. A single word can have
meaning (e.g. represent a conceptualization) when it is uttered in a
particular situation, in a particular context to a particular person, and it then
forms a concept.

In this context it can also be of interest to mention that in principle
the generative theories adopt the same point of view (cf. Miller, 1969,
p. 67). According to Schank (1972, p. 555), there are two elemental kinds
of concepts. A concept can be either (1) independent or (2) dependent. By
independent concepts are meant all that are interpretable in isolation. By
dependent concepts are meant attributes or modifiers. Components with
independent meaning are e.g. nominals (subject, object) and actions, often
expressed in verb form. Dependent concepts only become meaningful
through the concepts that are modified by means of the dependent concepts.
From the viewpoint of psychology, an analysis of natural language means
that the elements of the analysis are regarded as concepts and not as words.
In his chapter on the "Identification of conceptualizations underlying natural
language" Schank (1973, p. 192) now speaks of "three elemental kinds of
concepts. A concept can be either nominal, an action or a modifier". The
independent concepts are those that alone produce conceptualizations.
Schank calls these "picture producers". In other words, they are different
kinds of nominals. While Schank (1972, p. 555) also considers verbs to be
independent concepts, he has changed his position in 1973 (see Chap. 5,
p. 192) where he writes: "An action is what a nominal can be said to be
doing". Verbs are classified now solely as expressions of action towards
an object or goal.

Language is an expression of processes (actions, events, conditions and
relationships and associated persons, objects, and abstractions). This
process takes place within a structure, the clause. The process itself is
represented by the verb. Participators in the process are e.g. persons and
objects. They take the role of agent and goal. This role-playing in relation
to the verb is called transitivity.

This means that we cannot extract information from a text if we only
work with individual words. When people utter a thought, this takes place
as economically as the situation permits. In a dialogue between e.g.
researchers with a common frame of reference, the researchers can easily
communicate since they make use of such verbal representation that the
same conceptualization is produced in both parties involved (cf. Miller,
1969, p. 67). Thus, the interview text concerned forms a manifest output
of what is indicated by the speaker.

By conceptualization is meant the individual's use of certain rules for
relating concepts, e.g. grammar. Conceptualizations may be simple or
complex. In this way an utterance in an interview situation can be rich in
simultaneously underlying conceptualizations and make it difficult to
represent these in a sentence. Consequently a sentence in a text can contain
many completely expressed ideas and idea relations. The condensed
information, which is a result of the inherent economy in clause-linking, thus only
becomes available in a content analysis by means of a supplementation
procedure. But in order that we may carry out our analysis, a starting point
is needed from which we can build up the structure in an utterance. In this
analysis we begin with the action or the verb. An action can be said to be
something that an "agent" can achieve in relation to an "object". Agent is
used in the sense "action centre" and object consists of the means or the
goal of an action. In principle only two cases exist, namely (1) agent and
object coincide and (2) agent and object consist of two separate units.
Modifiers describe the attributes (qualities, relations) that characterize agent,
object or action. For this reason adverbials will be grouped around the
verb, while different attributes (qualifying and describing) are arranged
around subject and object.

Early attempts at designing computer programs that used natural
language were primarily concerned with syntactic analysis. Today
researchers in the field of psycho-linguistics no longer maintain that
syntactic analysis is essential for the development of computer programs.
Some have said that it is perhaps not at all necessary to use syntactic
analyses (see Schank, 1972, p. 555). But this is not to say that syntax cannot be
a very useful aid in an analysis of complex interview material. Syntax
implies sequence or the relation between the different parts of an utterance.
There are fixed and mobile positions in this structure. If these positions are
made use of in an analysis of text by arranging the basic elements (concepts)
in classes that have a certain defined relation to each other, this facilitates
considerably the processes of producing sequences, irrespective of whether
the sequences are produced manually or by computer or whether
syntactical relations or psychological relations are stipulated and used. Each
concept category can thereby also be related to all the others by means
of the conditions that are specified for a certain defined analysis purpose.
The purpose of our analysis is primarily to establish which actions (with
or without explicitly stated objects) are carried out by researchers.






4. REPRESENTATION OF STATEMENTS

A formalization of spoken text naturally cannot take place independently of
later stages in the planned analysis. Nor does it in any way replace
categories. Categories are the links between the theoretical anchorage
of the research problem and technical aspects of the content analysis. In
principle a content analysis can be carried out in accordance with the three
following basic models: (1) the association model, which presents
information in the form of statistical correlations between observable and
non-observable variables, (2) the discourse model, which studies information
defined by means of linguistic relationships and presents these relationships
in denotations and connotations and (3) the communication model, which
describes information by means of process and control within a dynamic
interaction system. The choice of model 3 includes models 1 and 2.
Since there are no adequate mathematical models for the last model (3),
it is model 2 that is most appropriate, considering the interview strategy
that has been used. The discourse model describes extra-linguistic
phenomena. It reproduces (is representative of) events within the source of
information (the researcher) and nominals occurring in the discussion
("discourse") that refer to, separate or connect non-grammatical objects or
concepts (see Krippendorff, 1969, p. 102).

The model presupposes that rules are drawn up. It should be pointed
out that this kind of structurization is not possible without a considerable
investment of energy, labor and time. Only when a highly structured
quantity of data exists can it be used in the practical research work and
when we wish at accelerated tempo to extract different types of information.
The creation of material characterized by a high degree of structurization
should be of particular importance when it can be assumed that the
material will be able to provide answers to future questions, an assumption
which should be relevant in research contexts.

The researcher's perception (description) and evaluation of the
process of formulating problems is of primary interest in this study, and
consequently the theories and techniques presented by Osgood (1956, 1959),
Stone (1966) and Holsti (1969) are well-suited for an analysis of the interview
material, since it is on the individual researcher's subjective interpretation
of a given situation and action that we wish to focus. The technique and
program that are being developed are based on the agent-action-goal
relationship and associated modifiers. In addition to this segmentation, "themes"
are also extracted from the sentences, so that important information can
be recovered or mediated in addition to the information that becomes
available by means of the basic paradigm. Figure 1 presents the different
stages of the analysis.

4.1 Directions for writing out spoken text

The directions that were given for the writing down of the interview
material have not included phonological transcription rules, but the importance
of an authentic recording of the audio-tape material has been emphasized.
All audible utterances have been written down, which means that the
speaker's slips of the tongue, corrections and incomplete sentences are
included.

4.2 Marking of text

The main problem in an analysis of verbal data for the purpose of obtaining
information is that it must be possible to select the relevant material
(concepts and concept relationships) from a quantity of possibly relevant
material. For this reason the parts of the text that are relevant for the
analysis should be marked. The object of the analysis is the statements
made by the persons interviewed and therefore in the marking phase all
interview questions, arguments and counterarguments from the interviewer
are denoted "non-relevant" text.
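
A minimal sketch of this marking step, assuming that the transcript is available as a sequence of (speaker, utterance) pairs. The speaker labels "I" (interviewer) and "R" (researcher) and the sample turns are hypothetical; the report does not describe how the marks were physically entered.

```python
def mark_turns(turns, interviewer="I"):
    """Attach a relevance mark to every (speaker, utterance) pair.
    Everything said by the interviewer is marked non-relevant (False)."""
    return [
        (speaker, utterance, speaker != interviewer)
        for speaker, utterance in turns
    ]


def relevant_text(turns, interviewer="I"):
    """Keep only the utterances that were marked as relevant."""
    return [
        utterance
        for _, utterance, relevant in mark_turns(turns, interviewer)
        if relevant
    ]


# Invented transcript fragment, for illustration only.
TRANSCRIPT = [
    ("I", "How do you arrive at a research problem?"),
    ("R", "It usually starts with the resource situation."),
    ("I", "But surely theory matters too?"),
    ("R", "Yes, er, theory comes in later."),
]

print(relevant_text(TRANSCRIPT))
```

Note that the interviewer's text is only marked, not deleted, so the original order of the dialogue is preserved.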

4.3 Segmentation of text

A collection of interview texts or any other texts can be extremely
comprehensive. Information that is to be extracted from a quantity of text can
in addition be very dispersed. In order to find relevant information each
individual interview must be gone through from beginning to end. The
postulation that there should be a structure in a text perhaps seems
somewhat trivial in this context, but it is not superfluous. It is the permanent
structure (established among other things by the order of the interview
questions) of the interview material that can be used for a division of the
large amount of text into manageable sections. It was considered
unrealistic (and proved to be so) to treat each individual interview as a unit when
coding and for this reason the interview material was divided into seven
question complexes.

4.4 The agent-action-object (AaO) paradigm

Even if there is no technique for an "objective" analysis that reflects "all"
the dimensions of the language, it is nevertheless possible to make use of
certain general paradigms for an analysis, treatment and structurization of
verbal data so that information becomes available. Dimensionality is a
central concept in behavioral science contexts. Horst (1968, p. 43) writes:
"The importance of this concept as a starting point for all psychological as
well as other scientific investigation is not generally recognized."
As has been mentioned earlier, every form of scientific investigation
presupposes that the researcher makes clear which aspects are to be mapped.
This in its turn leads to different models and data matrices, in which rows
usually represent the measurement objects of the investigation. Columns
refer to dimensions or attributes in which the objects of the investigation
("individuals") are to be measured. When using psychological tests,
assessment scales or questionnaires with fixed alternative answers, one obtains
test values that can be used directly for setting up data matrices. If on the
other hand one permits individuals to formulate their answers as they
themselves wish, some technique is needed to help transform the answers into
scores. The development of such an analysis technique is planned. The
intended processing of the interview material is to lead to scores that can
be studied by means of statistical models for both "uni-dimensional" and
"multi-dimensional" analyses.

[Figure 1. Flow-chart for designing a computer-based content analysis.
Stages shown: Directions for writing out spoken text; Marking of text;
Segmentation of text; AaO-paradigm clauses; Development of rules for
formalization; Allotment of codes to text; Supplementation of text;
Selection of coders and choice of test criterion; Revision of rules;
Renewed coder training; Construction of registers for independent and
dependent concepts; Scaling of dependent concepts; Design of search logic;
Definitions of profiles; Formulation of hypotheses; Tabulation of
distributions of frequencies; Statistical analysis; Inference.]
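
The step from coded free answers to a data matrix can be sketched as follows. The sketch assumes that each coded unit is available as an (individual, code) pair and uses plain frequency counts as scores. Counts are chosen here only because they are the simplest possible score; the report does not specify the scoring, and the sample data are invented.

```python
from collections import Counter


def data_matrix(coded_units, columns):
    """Build a data matrix from a list of (individual, code) pairs.
    Rows are individuals, columns are the codes given in `columns`,
    and each cell holds the number of units with that code."""
    counts = Counter(coded_units)
    individuals = sorted({individual for individual, _ in coded_units})
    return {
        individual: [counts[(individual, code)] for code in columns]
        for individual in individuals
    }


# Invented coded units, for illustration only.
UNITS = [(1, 30), (1, 40), (1, 40), (2, 30), (2, 70)]

for individual, row in data_matrix(UNITS, columns=[30, 40, 70]).items():
    print(individual, row)
```

Once the answers are in this form, each row can be treated like the test values obtained from a fixed-alternative questionnaire.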

Each written or spoken text has a basic structure that consists of
syntactical units, which it should be possible to use in the construction of data
matrices. Each syntactical unit (phrases, clauses or sentences) consists
in its turn of "words" that are arranged in accordance with the structural
rules of the language. This relationship, together with the development of
computers, has led to a growing interest in recent years in the automatization
of content analyses. The computer's processing speed has resulted in
attempts also being made to develop algorithms with the help of which it
should be possible to identify "relevant information" as opposed to an
identification of "words" as they occur in the text. (By algorithm is meant here
a mechanical method of approach for the transformation of utterances to
unambiguous analytical units.) For this purpose algorithmical codes must
be developed, i.e. codes based on rules for converting source material to
"equivalent terms".

The paradigm on which Osgood's "Evaluative Assertion Analysis"
(Osgood et al., 1956) is based is the AaO paradigm. Osgood presents a
method for separating attitude objects from "common meanings". Each
text is transformed into a succession of simple syntactical relationships.
The text is formalized according to the following model:

attitude object / connector / evaluating term

attitude object 1 / connector / attitude object 2



By the term attitude object, Osgood (1956, p. 47) denotes a message that
can be limited in a general linguistic context ("common meaning material").
Attitude objects are primarily nouns, which are placed in an evaluation
scale, dependent on predicative statements and attributes that are ascribed
to these attitude objects in the text. In this way the method is considered to
measure attitudes or evaluations of certain phenomena. The method is not
sufficient for our purposes, however, since it neither makes use of the
entire text nor exploits the processing capacity of the computer.

A further development of Osgood's method into a computer-based
analysis is presented by Holsti (1969). While the former method uses nouns
and adjectives, the latter also takes verbs into consideration as being of
importance for the evaluation of objects. Holsti's technique also permits
coding of the theme of the sentence, e.g. negation, tense and modality. The
basis of this method is syntactical coding. The first step in Holsti's
analysis is that one determines (1) agent and modifier, (2) action and modifier
and (3) goal and modifier. By means of numerical codes, the agent (3)
with its qualifiers is linked by e.g. a verb (4) with its qualifiers to the goal
(7) with its qualifiers. This linking can be illustrated with the following
concrete example from our material:

We (I) can say that / the resource situation / 3 has / 4 the whole time / 4
exercised pressure on / 4 problem definition / 7 problem limitation / 7.

"I can say that" expresses the speaker's opinion on the continuation and
need not be coded. "Exercise pressure on" cannot be separated, since the
three words belong together within the expression (although "pressure" is
not a verb). In addition the verb expresses direct action towards a goal.
The 3-4-7 relationship states the direction of the action.

Holsti's analysis technique has been developed with the written text
as starting-point. Such texts are carefully prepared works, while our
material is spoken text. Our starting-point has been Holsti's
agent-action-object-goal paradigm, but we wish to integrate Schank's psychological
arguments into our continued work. When applying Holsti's method on the
Swedish material, it soon became apparent, however, that we needed to
expand the codes. Codes for attributes to agent-action-goal and codes for
different kinds of qualifiers were related to elemental concepts by means
of a two-figure code system, in which the second figure states the
respective modifier.
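
The slash notation of the example in this section can be read mechanically. The sketch below assumes that every coded unit is written as "text / code" and that the uncoded lead-in ("We (I) can say that") has already been removed. The role names attached to the codes 3, 4 and 7 follow the text; the function names are hypothetical.

```python
import re

# Role names for the one-figure codes used in the example of this section.
ROLES = {3: "agent", 4: "action", 7: "goal"}

# One coded unit: some text, a slash, and a numerical code.
SEGMENT = re.compile(r"\s*(.+?)\s*/\s*(\d+)")


def parse_coded(line):
    """Split 'text / code text / code ...' into (text, code) pairs."""
    return [(m.group(1), int(m.group(2))) for m in SEGMENT.finditer(line)]


def group_by_role(pairs):
    """Collect the text units under agent, action and goal, in text order."""
    grouped = {role: [] for role in ROLES.values()}
    for text, code in pairs:
        grouped[ROLES[code]].append(text)
    return grouped


EXAMPLE = (
    "the resource situation / 3 has / 4 the whole time / 4 "
    "exercised pressure on / 4 problem definition / 7 "
    "problem limitation / 7"
)

pairs = parse_coded(EXAMPLE)
print(group_by_role(pairs))
```

Grouping the units in this way makes the 3-4-7 direction of the action explicit: one agent, three action units and two goals.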



4.5 Development of rules for formalization of text

A system for the recovery of information should ideally be able to be used
for every possible selection and not just for a selection of such material as
appears to be relevant on a certain given occasion. The demand on the
degree of structurization of the data quantity grows with increasing
quantities of data, the frequency of use of the data quantity (search for information)
and increasing specification of problems. The concepts or statements
(conceptualizations) that exist in the data quantity must be predictable. A
computer-based presentation of relevant information is based on predictable
relationships between concepts and predictable statements. A preliminary
attempt at developing rules for an Analysis of Concepts by Data-processing
(ANACONDA) has been made (I. Bierschenk, 1974). Trial codings have been
carried out and intercoder agreement has been calculated (Berg, 1974;
I. Bierschenk, 1974). Using ANACONDA, about 16% of the interview
material has been prepared for manual processing. Some computer programs
for similar processing are already in existence.

4.6 Allotment of codes to text

If we wish to guarantee that an analysis of texts leads to recovery of
information that is relevant to an investigation, we should choose for the
analysis a technique that makes use of syntax. This means that the original
relationships that exist between concepts and statements are preserved.

In the analysis each concept holding information that has a different
function in the clause than the other parts has been picked out. Each "unit",
consisting of one or more words, is allotted a code. These codes are
divided into the two basic units of the analysis mentioned earlier, with the
following four main categories: 30, 40, 50, 70 (not 60, since 50 and 60 are
easily mistaken for each other when data transcriptions are read). In
principle, these figures follow Holsti (1969). The first figure corresponds
to the following four categories: agent, action, object (means, goal). The
second figure states the independent or the dependent concept. The
dependent concepts that function as different kinds of qualifiers are allotted
figures that are immediately dominated by the independent concept: Agent,
for example, is given the figure 30 (0 = independent concept) and an
adjective that describes the agent is given the figure 32. In addition to the
individual parts of the clause, the statement's tense, mood etc. (see pp. 14-15)
are coded. In addition there are a number of codes for overall structures
that the coding of separate units cannot give. A theme has been defined as
a syntactic unit which consists of either (1) a main clause or (2) main
clause + one or more subordinate clauses. On the basis of ANACONDA a
test material has been prepared in accordance with the format presented
on pp. 14-15.
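
The two-figure code system described above can be read as a simple decoding rule. In the sketch below the first figure selects the main category and the second figure separates the independent concept (0) from its modifiers. The assignment of 5 to "means" and 7 to "goal" is an assumption inferred from the order in which the categories are listed; only 3 (agent), 4 (action) and 7 (goal) are stated explicitly in the text. The function name is hypothetical.

```python
# First figure -> main category. The entry for 5 is an assumption (see above).
CATEGORIES = {
    3: "agent",
    4: "action",
    5: "object (means)",
    7: "object (goal)",
}


def decode(code):
    """Split a two-figure code into its main category and concept type."""
    first, second = divmod(code, 10)
    if first not in CATEGORIES:
        raise ValueError(f"unknown main category in code {code}")
    if second == 0:
        kind = "independent concept"
    else:
        kind = f"dependent concept (modifier {second})"
    return CATEGORIES[first], kind


print(decode(30))  # the agent itself
print(decode(32))  # an adjective that describes the agent
```

A code beginning with 6 is rejected by this sketch, which mirrors the decision in the text to avoid the figure 60.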



[Figure 2. Coded text example. The figure is not legible in this copy.]



The maximal unit is a sentence, which can be divided into clauses of
different degrees. A sentence is complete as soon as it contains the two
main constituents, subject and verb (phrase). This analysis works with the
sequence of clauses. Therefore the main clause (the first column) can also
be called the first clause. In the example (Figure 2) the subject in the first
clause has a postpositive qualification in the form of a whole clause, which
must therefore be introduced in some other column. This is done by means
of the figure 3 in column 68, which states that the whole sentence's first
subordinate clause is to be found in columns 69 and 70. The first clause
continues in columns 66 and 67, which is stated in column 71. The object
in the clause is a that-clause and since it is "clause-worthy" it must be
placed directly in the next empty column (3), which is stated in column 74.
Thus, the object of the "main clause" consists of three clauses.

Since this analysis is to capture the function of each individual unit in
a clause, it is sometimes necessary to have double coding in the form of
clause subordinators. Here a "that" introduces a new clause, but has no
other function. "Which", on the other hand, has a function. In the first and
third cases, "which" changes function from qualifier to object, in the
second case from qualifier to subject.

Figure. 3 on p 1 7 pteoents the flow chart for a conciputer search for the text 
example shown .in F^ure 2^ Only such structures as can be stated e>:ph«.itly 
can be delegated to a computer-based Vy^t^m. Each desired facet cannot 
be stated in advance, nor be extracted from a text material. For this reason, 
we have, in addition to the segmentation discussed, also devised codes for 
the main theme, so that the fundamental information, v/hich cannot be re- 
gained or'mediated by means of this clause code, does not get lost- 



CODING SCHEDULE

Variable                                 Pos.        Comments

Identification
  Interviewed person No.                 01-40
  Question No.                           01-n
  Sentence No.                           0001-n
  Concept No. in sentence                01-n

Clause codes
  Source of statement (col. 11)          1, 2        1 = the speaker himself
                                                     2 = someone other than the speaker
                                                         has made the statement
  Negation                               1, 2, 3     1 = not, no, none, nobody, nothing
                                                     2 = hardly or the like
                                                     3 = neither - nor
  Tense                                  1, 2, 3     1 = present time, refers to when
                                                         statement is made
                                                     2 = past time from the occasion
                                                         when statement is made
                                                     3 = future time from the occasion
                                                         when statement is made
  Mood                                   1, 2, 3, 4  1 = indicative
                                                     2 = imperative
                                                     3 = conjunctive
                                                     4 = modal auxiliaries
  Condition                              1, 2        1 = the conditional clause
                                                     2 = the corollary
  Cause                                  1, 2        1 = the causal clause
                                                     2 = corollary
  Concession                             1, 2        1 = the concession
                                                     2 = the restriction
  Result or intention                    1, 2        1 = "the main clause"
                                                     2 = result/intention clause
  Contrast                               1, 2        1 = either, admittedly,
                                                         on the one hand
                                                     2 = or, but, on the other hand
  Comparison                             1           1 = occurrence
  Question                               1           1 = occurrence
  Assumption                             1           1 = occurrence
  Volition                               1           1 = occurrence

  Number of cards (lines) with word      2-9
  units in which there is not room
  for the unit on one line

  Interrelation between cards (lines)    1-9
  containing the same word unit

  Text

Summary: main categories

Code    Content
30      Agents
31      Qualifier of agent (before)
32      Description of agent
33      Qualifier of agent (after)
40      Verb, verb phrase
[41]    Copula verb
42      Clause adverbial (clause modifier)
43      Time adverbial (when?)
44      Place adverbial (where?)
45      Manner-degree adverbial (how?)
46      Adverb or other word stating comparison, contrast
50      Direct object
51      Qualifier of direct object (before)
52      Description of direct object
53      Qualifier of direct object (after)
[70]    Object of goal
71      Qualifier of object of goal (before)
72      Description of object of goal
73      Qualifier of object of goal (after)
etc.

Clause columns

Col.    Content                                          Pos.
66-67   Main clause columns                              two-figure codes
68      Reference column for subordinate clause          2, 3, 4, 5
        columns
69-70   First subordinate clause columns                 two-figure codes
71      Reference column for other subordinate clause    1, 3, 4, 5
        columns or a looping back to main clause
72-73   Second subordinate clause columns                two-figure codes
74      Reference column for other subordinate clause    1, 2, 4, 5
        columns or a looping back to main clause
75-76   Third subordinate clause columns                 two-figure codes
77      Reference column for other subordinate clause    1, 2, 3, 5
        columns or a looping back to main clause
78-79   Fourth subordinate clause columns                two-figure codes
80      Reference column for other subordinate clause    1, 2, 3, 4
        columns or a looping back to main clause
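The column layout of the clause section (columns 66-80) can be illustrated with a small sketch. It is not part of the ANACONDA programs, which the report does not list; the field names and the function `parse_clause_columns` are hypothetical labels chosen for the illustration. Only the column positions and the permitted reference values are taken from the coding schedule.

```python
# Sketch only: slices columns 66-80 of one 80-column card into named fields.
# Field names are hypothetical; positions and allowed reference values follow
# the coding schedule (68: 2-5, 71: 1,3,4,5, 74: 1,2,4,5, 77: 1,2,3,5, 80: 1-4).
CLAUSE_FIELDS = [
    # (name, first column, last column, allowed reference values or None)
    ("main_clause", 66, 67, None),
    ("ref_after_main", 68, 68, {2, 3, 4, 5}),
    ("subordinate_1", 69, 70, None),
    ("ref_after_sub_1", 71, 71, {1, 3, 4, 5}),
    ("subordinate_2", 72, 73, None),
    ("ref_after_sub_2", 74, 74, {1, 2, 4, 5}),
    ("subordinate_3", 75, 76, None),
    ("ref_after_sub_3", 77, 77, {1, 2, 3, 5}),
    ("subordinate_4", 78, 79, None),
    ("ref_after_sub_4", 80, 80, {1, 2, 3, 4}),
]


def parse_clause_columns(card: str) -> dict:
    """Return the clause fields of one card; blank fields become None."""
    card = card.ljust(80)  # pad short lines so that every column exists
    fields = {}
    for name, first, last, allowed in CLAUSE_FIELDS:
        raw = card[first - 1:last].strip()  # card columns are 1-based
        if not raw:
            fields[name] = None
            continue
        if not raw.isdigit():
            raise ValueError(f"columns {first}-{last}: non-numeric {raw!r}")
        if allowed is not None and int(raw) not in allowed:
            raise ValueError(f"column {first}: {raw} not in {sorted(allowed)}")
        fields[name] = raw
    return fields
```

A card whose columns 66-73 read `30333150` would thus yield the main clause code 30, the reference 3, the first subordinate clause code 33, the reference 1 and the second subordinate clause code 50. A reference value outside the permitted set raises an error, which is one way of catching punching faults in the clause columns.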



[Flow chart not recoverable from the scan.]

Figure 3. Search paradigm



4.1 Supplementation of text

Some sentences can be fragments that cannot be supplemented into
independent conceptualizations, i.e. agent-action-object (AaO). In these cases
in which the coder does not understand an utterance, it is to be deleted.
The utterance must be completely comprehensible, which means that
different types of relation words (e.g. pronouns and adverbs) must be
supplemented to their right meaning (referent) in the context. Supplements
are placed in parenthesis, so that the analysis does not lose track of what
the person interviewed in fact says. When choosing the words to be used in
the supplements, those already used by the person interviewed are taken
first, if the context does not make this impossible.

Some deletions are made when the material is segmented. When
defining a sentence, one cannot always assume that each sentence in the text
has been concluded with a full stop. A unit between two full stops can consist
of several sentences, either separated by means of pauses that are marked
in the transcription by a series of dots or fragments which can be
supplemented and made into complete sentences. Another way of marking the
beginning and end is by linking with "and" or other conjunctions, which in
this analysis are taken as being the first unit in a sentence and coded as
"having no meaning". (This does not apply to an "and" that links two
objects in the same clause.) In the cases in which obvious corrections are
made by the person interviewed, the utterance that is immediately corrected
is not coded.



4.8 Selection of coders and choice of criteria for intercoder agreement

The computer-based processing of text according to ANACONDA is based
on pattern recognition and the treatment of manually inserted clause codes.
The assessments which e.g. two independent assessors give for an
"information unit" with respect to the same category can best be considered as
parallel "tests", which at the same time assumes that both assessors have
"identical" frames of reference or systems of relationships. An examination
of the precision of the coding done by the assessors is one of the
prerequisites if we are to be able to demonstrate the objectivity in content
analytical processing of verbal material. The "reliability" of the assessors'
coding is above all a problem of communication, i.e. the precision of the
coding is dependent on the communicability of the criteria stated in
ANACONDA. To summarize, it can be said that the reliability of the coding is
a function of (1) the unequivocality of the information units, (2) the
unequivocality of manual and category functions and (3) assessors' special
frame of reference, e.g. knowledge of linguistics and knowledge of the
subject. The assessors form the measuring instrument in the analysis. In
addition the unequivocality of the information contained in the basic units
influences the reliability to a large degree. But since it is very difficult,
if not impossible, to get the entire process under control, the possibility
of increasing the reliability is usually limited to manipulation with the
assessors and/or manual. For this reason it is more justifiable to use the
term "intercoder agreement", at least as long as the allotment of codes
cannot take place mechanically.

If we have estimated the intercoder agreement, irrespective of which
method of estimation has been used, it is usually very difficult to judge
whether the calculated index value can be considered satisfactory. It can
be very difficult to determine a reasonable level of agreement, since there
is no simple solution to this problem. Moreover, it is only possible to
decide what can be considered a satisfactory "reliable" coding within the
frame of a given problem.

Starting from a coded material of this kind, analyses can be carried
out that are based on the discourse model (Krippendorff, 1969, p. 80). By
means of this model, it becomes possible to represent events or ideas
within the source of information (the researcher). The use of the model
presupposes that independent coders can allot codes to the text with a
satisfactory level of agreement. As the criterion we have stipulated 80%
agreement.

Two methods of assessment were applied. The first method (Osgood
et al., 1956, p. 57) states the proportional agreement. Segment markings,
supplementations, coding of subject, object, verb and modifiers were
assessed according to this method. Osgood's technique was applied
primarily with the purpose of making it possible to compare our results with
those presented by Osgood.

The second method is based on the binomial distribution hypothesis. By
means of the binomial test (Siegel, 1956, p. 40), the extent to which the
criterion (80% agreement) was fulfilled was studied. An observed value
that is less than the test value of z = -1.64 states that the intercoder
agreement does not fulfill the criterion.
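The two measures can be sketched as follows. The exact formulas used in the evaluation are not given in this report, so both functions rest on assumptions: `proportional_agreement` uses the form 2M/(N1 + N2), a common way of stating an Osgood-type index from the number of common assessments M and the number of assessments made by each coder, and `binomial_z` uses the large-sample normal approximation to the binomial test with a continuity correction. The function names are hypothetical.

```python
import math


def proportional_agreement(common: int, n_coder_a: int, n_coder_b: int) -> float:
    """Assumed index form: twice the common assessments over the sum of both coders' assessments."""
    return 2 * common / (n_coder_a + n_coder_b)


def binomial_z(agreements: int, total: int, criterion: float = 0.80) -> float:
    """Normal approximation to the binomial test against the stipulated criterion."""
    expected = total * criterion
    sd = math.sqrt(total * criterion * (1 - criterion))
    diff = agreements - expected
    # Continuity correction: move the observed count 0.5 towards the expectation.
    diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    return diff / sd


def fulfils_criterion(z: float, critical: float = -1.64) -> bool:
    """The criterion counts as not fulfilled when z falls below the critical value."""
    return z >= critical


# Illustrative figures only, not values from the study.
z = binomial_z(agreements=90, total=100)
print(round(z, 3), fulfils_criterion(z))
```

With 90 agreements in 100 assessments the sketch gives z = 2.375, which lies above the critical value, whereas 70 agreements in 100 give z = -2.375 and the criterion is not fulfilled.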

The persons who have coded the interview material have within the
framework of the SOrl-project developed ANACONDA. In addition an attempt
was made to train two additional coders. This attempt has for various
reasons, however, been abandoned. The persons who were to code one
part of the interview material were trained for [illegible] 15 hours in the use
of the first version of ANACONDA. They practised (1) separating relevant
from non-relevant text material, (2) segmenting text into meaningful units
and (3) identifying syntactical relationships. In addition they practised (4)
writing word units in sequence in accordance with a pattern, (5)
differentiating between types of clause according to definition and (6) interpreting
analysis units and arranging codes (see I. Bierschenk, 1974, p. 16).

By now the developmental work with ANACONDA has reached the point
at which, according to Figure 1, we have to estimate the intercoder
agreement. Some of the empirical results of the evaluation will be presented next.
But the steps below the "criterion of intercoder agreement" in Figure 1 will
not be pursued further here; instead these steps will be discussed briefly
in Chapter 6.

5. INTERCODER AGREEMENT: SOME EMPIRICAL RESULTS

5.1 Intercoder agreement

If we are to be able to develop a technique for a computer-based content
analysis, it will be necessary for us to create a system of rules that two
or more independent coders can use with a high degree of mutual agreement.
Berg (1974) calculated the agreement between two independent coders. This
scrutiny concerned (1) supplementation and deletion, (2) segmentation and
(3) allotment of codes.

By means of a random table, four interview subjects (31, 2, 40 and 33)
were picked out from the interviewed sample of researchers. From the
respective interviews, four interview questions (5, 6, 7 and 8) concerning
information and documentation have been chosen. It can be assumed that
the information that will be extracted from the text will be relatively
concrete and consequently easy to interpret. This should be an advantage in
the development of a new technique.

The interview questions were to be coded in their entirety, so that the
context of the discussion could be used in supplementation. Spreading the
selection of text over the entire text or over all the subjects has been
considered an unsuitable method of procedure.

The intercoder agreement was examined with regard to

1. segmentation of concepts. A check is made of whether both coders
   have supplemented and deleted identical words.

2. segmentation of clauses. A check is made of whether the coders have
   identical sentences.

3. allotment of codes to concepts. A check is made of whether both coders
   have allotted identical codes to one and the same concept.

4. allotment of codes to themes. A check is made of whether both coders
   have allotted identical codes to one and the same theme in a sentence.

All the comparisons are of the same type, i.e. either there is agreement or
not. The number of common assessments has been noted. In addition the
total number of assessments and the number of assessments each coder
has made separately have been calculated. A detailed scrutinization and
comprehensive documentation may be found in Berg (1974). Here,
however, only a summarizing table will be presented, with the values for
points 1 to 4 above. The values have been taken from Berg (1974, p. 30)
and will be presented in reorganised form.
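The counting behind such a comparison can be sketched in a few lines. The data layout is an assumption made for the illustration: each coder's work is held as a mapping from a unit key (here a hypothetical pair of sentence number and concept number) to the allotted code, and the function name `compare_codings` is likewise hypothetical.

```python
def compare_codings(coder_a: dict, coder_b: dict) -> dict:
    """Count common assessments, all assessed units and each coder's own assessments."""
    units = set(coder_a) | set(coder_b)  # every unit assessed by at least one coder
    common = sum(
        1
        for unit in units
        if unit in coder_a and unit in coder_b and coder_a[unit] == coder_b[unit]
    )
    return {
        "common": common,
        "total": len(units),
        "coder_a": len(coder_a),
        "coder_b": len(coder_b),
    }


# Illustrative data only: keys are (sentence no., concept no.), values are codes.
a = {(1, 1): "30", (1, 2): "40", (1, 3): "50", (2, 1): "30"}
b = {(1, 1): "30", (1, 2): "40", (1, 3): "53"}
print(compare_codings(a, b))
```

Each comparison is of the either-or type described above: a unit counts as common only when both coders have assessed it and allotted the identical code.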



Table 1. Summary of intercoder agreement in applying ANACONDA

Steps in the analysis                       Interview person
                                         31        2       40*       33

Segmentation of                   z     3.92     2.20     -.58      3.21
concepts (1)                      i      .88      .86      .82       .86
                                  N     [illegible in the scan]

Segmentation of                   z     2.82     2.64      .67     -2.76
clauses (2)                       p     [illegible in the scan]
                                  i     [illegible in the scan]
                                  N      165      227       47       246

Allotment of codes                z     7.64     9.42     1.16      8.51
to concepts (3)                   i      .91      .92     [illegible]
                                  N      841     1089      222      1190

Allotment of codes                z     7.33     5.51     1.40      4.37
to themes: source,                i      .98      .93      .93       .93
time, mode                        N     [illegible in the scan]

----------------------------------------------------------------------

Segmentation of                   z    -9.89   -13.08    -4.71    -17.60
concepts before                   i      .77      .76      .76       .74
check on comparable               N     1013     1377      272      1673
text

Allotment of codes                z    -2.47    -4.52    -6.23    -10.40
to concepts before                i      .83      .82      .73       .78
check on comparable               N      992     1328      283      1549
concepts

z  test value, binomial test
p  probability; p < .05 that the criterion .80 has not been achieved
i  Osgood's index for agreement
N  total number of assessments

* IP 40 has given oral comments to question 5. Questions 6 and 7
  were answered by filling in a questionnaire, while the IP
  did not comment on question 8.



The checks of the intercoder agreement in the steps of the analysis carried
out so far show that segmentation can be done with a satisfactorily high
level of agreement. As Table 1 shows, Osgood's index for agreement is
between .74 and .98. Spiegelman, Terwilliger & Fearing (1953, p. 175)
give as the minimum requirement an index value that is equal to or greater
than .75, irrespective of the method by which the intercoder agreement has
been estimated. Osgood (1956, p. 59) himself reports index values of
between .64 and .88. Our result is by comparison very satisfactory, since
the analysis this report is dealing with is much more detailed and
comprehensive. In addition the interview material contains for natural reasons
greater variations, while at the same time it is less complete than Osgood's
printed material.



The binomial test shows, however, that the critical value .80 could
not be established in every case. As is shown in Table 1, neither the
"segmentation of concepts before a check on comparable text" nor the "allotment
of codes to concepts before a check on comparable concepts" has resulted
in satisfactory values. This is caused by the lack of unequivocal rules. If,
for example, one coder uses the term "researcher" while another describes
the same person as a "behaviorist", this leads to differences in
supplementation. This difference can, however, be nullified by e.g. appropriate
construction of registers and facetting. All the supplementations are marked
in parenthesis, which makes it possible for us to analyse the material both
with and without supplementations and thus investigate the extent to which
this leads to different results.
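Because supplements are marked in parenthesis, the two versions of a text can be derived mechanically. The following is a minimal sketch under the assumption that parentheses are used only for supplements and are never nested; the function names are hypothetical.

```python
import re

# Matches one parenthesised supplement together with the whitespace before it.
SUPPLEMENT = re.compile(r"\s*\([^()]*\)")


def without_supplements(text: str) -> str:
    """Return the utterance as the interviewed person actually said it."""
    stripped = SUPPLEMENT.sub("", text)
    return " ".join(stripped.split())  # normalise leftover whitespace


def with_supplements(text: str) -> str:
    """Return the utterance with the supplements kept but the parentheses removed."""
    return " ".join(text.replace("(", " ").replace(")", " ").split())


# Illustrative sentence only, not taken from the interview material.
sentence = "He (the researcher) reads it (the report) daily"
print(without_supplements(sentence))
print(with_supplements(sentence))
```

Running both versions through the same analysis is one way of investigating how far the supplementations influence the results.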

The index values reported above the line are comparable with the
results that we would have got by limiting concepts in written text. As can be
seen from Table 1, the agreement is very good, though with the exception
of "Segmentation of clauses" in interview No. 33. This is probably a result
of there being a large number of unsupplemented clauses (see Berg, 1974,
p. 23).

Attributes and adverbs have obviously caused most of the deviations in
the coding. The agreement for attributes is admittedly over 80%, but some
of the deviations could be explained by the confusion that has occurred
between the two categories. Thus, the coding of e.g. "Researcher A in Malmö"
has partly been as an adverb of place "in Malmö" and partly as a
postpositive attribute. In addition there has been confusion between adverbs of
time and degree. In the clause "read daily" the word "daily" has been
coded both as a statement of time (adverb of time) and as a statement of
frequency (adverb of degree). Concerning the examples presented here,
the rules will be improved.

5.2 Control of punch cards

It is important that the text material that is to form the basis for the further
development of ANACONDA is faultless. Otherwise it would be very
difficult if not impossible to determine whether a fault is caused by incorrect
coding or by some deficiency in the test material. For this reason the text
material transferred to punch cards has been checked both for faults
despite correct coding and for faults resulting from incorrect coding. A
detailed examination has been made and documented by I. Bierschenk (1974).

The test material comprises about 37,000 punch cards. The punching
was carried out by the punching machine operator at the Department of
Educational and Psychological Research, Malmö School of Education. A
selection of punch cards (10%) was handed to the Data Processing Centre
for Research and Higher Education in Lund. The punchings were then
examined for (1) identification faults (ip no, question no, sentence no,
word no), (2) theme (source, negation, tense, case, other clause themes),
(3) text (spelling, parenthesis, other text), (4) content (concepts, clause
column).

The result of this examination is presented in condensed form in
Table 2. (For more detailed information, see I. Bierschenk, 1974, p. 31).

Table 2. Punching and control-punching. Observed and relative frequency
         calculated on 70,260 punches

Category   Specification      Punching         Control-punching
                              f       %        f       %

1          Identification      7     .01         4     .01
2          Theme               3     .00         8     .01
3          Text               26     .04       102     .14
4          Content            22     .03        32     .05

Σ                             58     .08       146     .21

From Table 2 it emerges that the control-punchings have been carried out
less well than the original punchings. The similarities are greatest within
categories 1 and 2, while the differences are greatest within category 3.
This can be explained by the fact that numerical codes (cat. 1 and 2) are
more common for the machine punching operators and occur less frequently
in this material. In addition there is a system in the theme codes. Source,
tense and case are always punched, while negation and other clause themes
are only punched when they occur. But category 4 also contains numerical
punching. Incorrect punching has serious consequences if e.g. a verb is
placed in some noun (object, subject) category. Moreover, it is an extremely
time-consuming and difficult job to check all the concepts included in each
respective code.

Mistakes in category 3 mean among other things that the parenthesis
sign has been neglected. This sign is important, however, when we wish
to keep apart actual statements and implied or imagined ones.

In order that we should be able to form an idea of the consequences
of the content codes throughout the entire material, all the material was
corrected. Thereby it became possible to draw up a protocol with all errors.
The text of all forty interview persons was examined on questions 5, 6, 7
and 8, card by card, and every error was registered. The results of the
examination are presented in condensed form in Table 3. (For more de...
[sentence truncated in the scan])




Table 3. Punching and coding errors in examination of the total punched
         material. Observed and relative frequency calculated on
         702,600 punches

Category   Specification      Punching        Coding          Σ
                              f       %       f       %       f       %

1          Identification      -      -        -      -        -      -
2          Theme               6     .00      53     .00      59     .00
3          Text              182     .03      34     .00     216     .00
4          Content            88     .00      99     .00     187     .03

Σ                            276     .04     186     .03     462     .07



















As can be seen in Table 3, the greater part of the errors depend on the
punching. Corrections within category 4 covariate with alterations in the
text. But since the examination made showed that we in future only need
calculate with approximately .04% punching errors and .03% coding errors,
they are with regard to the clause columns a negligible factor.
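The relative frequencies quoted here follow directly from the observed totals and the number of punches. As a short arithmetic check, with a hypothetical helper name:

```python
def relative_frequency(errors: int, punches: int) -> float:
    """Errors as a percentage of all punches, rounded to two decimals."""
    return round(100 * errors / punches, 2)


# Totals reported for the examination of the whole material (702,600 punches).
print(relative_frequency(276, 702_600))  # punching errors
print(relative_frequency(186, 702_600))  # coding errors
print(relative_frequency(462, 702_600))  # all errors
```

The three calls give 0.04, 0.03 and 0.07 per cent respectively, in agreement with the totals stated for punching errors, coding errors and their sum.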







6. STRUCTURED REGISTERS 






Before a computerized analysis of information can be realized, the researcher
must state his theoretical standpoint, i.e. define his concepts. It is necessary
to establish in advance which aspects of the material are to be paid attention.
Categories form the link between the theoretical anchorage of the research
problem and the technical aspects of the analysis of information. By means
of registers, intended to be used for content analytic treatment of texts,
natural language is transformed into formalized language. This
transformation assumes a purpose or a theory. The questions that have guided the
interviews with researchers at the departments of educational research are
based on the assumption that the initial phase of the research is influenced
by

1. the qualities of individual researchers and the social system within
   which they have to act and react - constraints

2. the interests, motivation and role-behaviors of individual researchers
   - intervening variables

3. recommendations for changes or improvements of e.g. research
   planning - research policy

Our hope is that we shall be able to answer at least three questions:

1. Which criteria guide the researcher's approach during the initial phase
   of the research, i.e. what values do the researchers have?

2. What actions do the researchers take during the initial phase of the
   research and how are these evaluated?

3. What steering mechanisms influence the development of the initial
   research phase?

But what situations and actions are to be extracted from the present
material and how are they to be evaluated? These questions must be answered
in connection with the construction of structured registers. In principle
each individual concept in the text can form its own category.

Irrespective of how the registers are built up, it should be of great
help if e.g. proper nouns and place references form separate facets. If it
should prove to be desirable to have facetted registers, the analysis must
begin with the recognition of individual concept patterns in the text according
to a register. The analysis technique that is to be developed for the
interview material demands at least four different registers: (1) Independent
concepts (subject and object terms), (2) Dependent concepts (adjectives/
attributes), (3) Actions/copula (verbs), (4) Dependent concepts (adverbs).

With the help of the computer, lists are produced of these parts of
speech. Registers are then compiled on the bases of these lists. Criteria
for the selection of important concepts already exist to some extent in the
form of the question complexes that are dealt with in the interview. By
means of the KWIC program, the registers can be adapted very closely to
the verbal behavior of the researchers.
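The KWIC (key word in context) principle can be illustrated by a small sketch. It is not the KWIC program referred to in the text, whose details are not given here; the function name `kwic`, the tokenisation by word characters and the context width are assumptions made for the illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list:
    """List every occurrence of keyword with up to `width` words of context on each side."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Illustrative text only, not taken from the interview material.
sample = "The researcher reads reports. Another researcher writes reports daily."
for left, key, right in kwic(sample, "researcher", width=2):
    print(f"{left:>20} | {key} | {right}")
```

Listing each candidate term together with its immediate context shows how the interviewed researchers actually use it, which is the information needed when a register is adapted to their verbal behavior.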

In constructing registers 2-4, Osgood's semantic differentials were
used. Each term is defined with regard to (1) evaluation, (2) activity and
(3) potency. The assessment is made according to seven-point and bipolar
scales with the respective pairs of adjectives (1) negative/positive,
(2) passive/active and (3) weak/strong. The advantage of this scaling
technique is that it is simple to use and that we can study three independent
dimensions. By means of the evaluation dimension, the extent to which the
researcher assesses different aspects as good or bad can be studied. The
activity dimension measures the extent to which the researcher considers
that a particular aspect has influenced the development of project outlines
or behavior during the initial phase of the research process. The potency
dimension measures the researcher's sensitivity or responsiveness.
Dimensions two and three together express dynamics.

It is assumed that assessors can make reliable and valid assessments
of the tendency and intensity of a statement. An initial analysis for
determining the reliability of the scaling of adjective and verb has been carried
out. The results are presented in Table 4.

Table 4. Intraclass correlation in the scaling of modifiers and actions

Class          Pairs of adjectives
               Negative/       Passive/       Weak/
               positive        active         strong

Modifiers        .90             .98           1.00
Actions       [illegible]        .97            .98



As is shown in Table 4, the words within each group are assessed very
similarly. An evaluation of the inter-assessor agreement has shown,
however, that the assessors have differing ideas in the evaluation of both
modifiers and actions. One example of this dissimilarity in the assessment
can be given with the modifier "psychological". Depending on whether the
word stands before or after its main word, the assessment could be made
in different ways. If "psychological methods" are named in the text, the
word "psychological" can be treated as an adjective. If on the other hand
the text mentions "methods in psychology", the word can be treated as a
qualifier placed after its main word. In the first case the word is descriptive
and in the second case explanatory. The word "psychological" has been
assessed in the way shown in Table 5.

Table 5. Assessment scores of the modifier: psychological

Assessor       Dimension
               1       2       3

1              5       5       5
2              6       6       6
3              4       4       6



The example given in Table 5 reflects the different references made by the
assessors to the word "psychological". Even though the agreement in the
assessment is low, the average nevertheless appears to be a good
approximation of the sense in which the word is commonly used. The result shows
that the word "psychological" as a descriptive term is not wholly neutral.
The fact that the agreement between the assessors is low can be explained
partly by their different backgrounds, partly by the circumstance that the
assessment was carried out without any special instructions having been
given. Thus, the way in which a person himself chooses an expression is
our main concern. One could admittedly claim that (1) methods in
psychology and (2) psychological methods are equivalent forms of expression for
the same conceptualization. But the coding of the text must permit us to
state content and not only position. The content is not determined until
further steps have been carried out, i.e. the scaling.
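The averaging mentioned above can be made concrete with the scores of Table 5. The sketch assumes that the rows of the table are the three assessors and the columns the three dimensions, and that the dimensions appear in the order evaluation, activity, potency; neither assumption is stated explicitly in the table. The names used are hypothetical.

```python
from statistics import mean

# Scores for the modifier "psychological" on the seven-point scales,
# read from Table 5 as assessor -> (dimension 1, dimension 2, dimension 3).
SCORES = {
    1: (5, 5, 5),
    2: (6, 6, 6),
    3: (4, 4, 6),
}


def dimension_means(scores: dict) -> list:
    """Average the assessors' scores separately for each dimension."""
    per_dimension = zip(*scores.values())  # regroup the scores by dimension
    return [mean(values) for values in per_dimension]


for number, value in enumerate(dimension_means(SCORES), start=1):
    print(f"dimension {number}: {value:.2f}")
```

On this reading the averages are 5.00, 5.00 and 5.67, all above the midpoint 4 of a seven-point scale, which is in line with the remark that the word is not wholly neutral.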

6.1 Facets

A content analysis presupposes categories or facets. Concepts, idiomatic
expressions or phrases represent thereby a variable according to a certain
given theory. The fundamental means of approach in a content analysis is
to identify these "signs", so that they can be coded according to the category
to which they belong (Stone, 1966, pp. 170-186). In a content analysis,
however, it is seldom so that the analysis concerns only one "category", but
instead the interest concerns the relation between categories. The
categories form the semantic and empirical anchorage of the analysis, since
individual concepts are defined according to the researcher's
conceptualization, e.g. category systems.

Registers for content analytical processing function as links between
the natural language and a more formal, theory-oriented language. At the
present stage of the analysis it is difficult to establish how the information
made available with the help of the coding system presented and
information such as specific theory-oriented concept groups should best be used.
The building up of structured registers should, however, aim at

1. the possibility of being able to test several theories

2. clear divisions of relevant facets and

3. if possible, the statement of the relations between individual facets,
   e.g. proper noun & institution, proper noun & role, etc.

Thus the construction of our registers involves both a classification of the
material into evaluative categories and thematic classifications. If we are
to be able to carry out the analysis successfully, it seems at present as if
we must reach a decision on some form of facetting.

7. REFERENCES

Berg, M. Reliabilitetsprövning av en metod för innehållsanalys av intervju-
   text. /Reliability testing of a method of content analysis applied to
   interview texts./ Testkonstruktion och testdata, No. 26, 1974.

Bierschenk, B. Perception, strukturering och precisering av pedagogiska
   och psykologiska forskningsproblem på pedagogiska institutioner i
   Sverige. /Perception, structuring and definition of educational and
   psychological research problems at departments of education in Sweden./
   Pedagogisk-psykologiska problem, No. 254, 1974.

Bierschenk, I. Konstruktion av ett regelsystem för en datorbaserad
   innehållsanalys av intervjutext: Preliminär manual och några utprövnings-
   resultat. /Construction of rules for a computer-based content analysis
   of interview texts: preliminary manual and some evaluation data./
   Testkonstruktion och testdata, No. 25, 1974.

Bobrow, D.G. Syntactic theories in computer implementation. In: Borko, H.
   Automated language processing. New York: Wiley, 1967. Pp. 215-251.

Gerbner, G., Holsti, O.R., Krippendorff, K., Paisley, W.J. & Stone, P.J.
   (Eds.) The analysis of communication content. Developments in
   scientific theories and computer techniques. New York: Wiley, 1969.

Holsti, O.R. Content analysis for the social sciences and humanities.
   Reading: Addison-Wesley, 1969.

Horst, P. Personality: The measurement of dimensions. San Francisco:
   Jossey-Bass, 1968.

Krippendorff, K. Models of messages: Three prototypes. In: Gerbner et al.
   The analysis of communication content. Developments in scientific
   theories and computer techniques. New York: Wiley, 1969. Pp. 69-106.

Miller, G.A. Kommunikation och psykologi. /Communication and psychology./
   Stockholm: Beckmans, 1969.

Osgood, Ch.E., Saporta, S. & Nunnally, J.C. Evaluative assertion analysis.
   Litera, 1956, 2, 47-102.

Schank, R.C. Conceptual dependency: A theory of natural language
   understanding. Cognitive Psychology, 1972, 3 (4), 552-631.

Schank, R.C. & Colby, K.M. (Eds.) Computer models of thought and
   language. San Francisco: Freeman, 1973.

Siegel, S. Nonparametric statistics for the behavioral sciences. New York:
   McGraw-Hill, 1956.

Spiegelman, M., Terwilliger, C. & Fearing, F. The reliability of
   agreement in content analysis. J. soc. Psychol., 1953, 37, 189-203.

Stone, P.J. The general inquirer: A computer approach to content analysis.
   Cambridge, Mass.: The MIT Press, 1966.






Department of 
Educational and 
Psychological Research 

School of Education 
Malmö, Sweden






